{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "07baa7d5-89a4-4b22-9dc2-7e43ec2ff9a9",
   "metadata": {},
   "source": [
    "# Text-to-Video retrieval with S3D MIL-NCE and OpenVINO\n",
    "\n",
    "This tutorial based on [the TensorFlow tutorial](https://www.tensorflow.org/hub/tutorials/text_to_video_retrieval_with_s3d_milnce) that demonstrates how to use the [S3D MIL-NCE](https://tfhub.dev/deepmind/mil-nce/s3d/1) model from TensorFlow Hub to do text-to-video retrieval to find the most similar videos for a given text query.\n",
    "\n",
    "MIL-NCE inherits from Multiple Instance Learning (MIL) and Noise Contrastive Estimation (NCE). The method is capable of addressing visually misaligned narrations from uncurated instructional videos. Two model variations are available with different 3D CNN backbones: I3D and S3D. In this tutorial we use S3D variation. More details about the training and the model can be found in [End-to-End Learning of Visual Representations from Uncurated Instructional Videos](https://arxiv.org/abs/1912.06430) paper.\n",
    "\n",
    "This tutorial demonstrates step-by-step instructions on how to run and optimize S3D MIL-NCE model with OpenVINO. An additional part demonstrates how to run quantization with [NNCF](https://github.com/openvinotoolkit/nncf/) to speed up the inference.\n",
    "\n",
    "The tutorial consists of the following steps:\n",
    "\n",
    "#### Table of contents:\n",
    "- [Prerequisites](#Prerequisites)\n",
    "- [The original inference](#The-original-inference)\n",
    "- [Convert the model to OpenVINO IR](#Convert-the-model-to-OpenVINO-IR)\n",
    "- [Compiling models](#Compiling-models)\n",
    "- [Inference](#Inference)\n",
    "- [Optimize model using NNCF Post-training Quantization API](#Optimize-model-using-NNCF-Post-training-Quantization-API)\n",
    "    - [Prepare dataset](#Prepare-dataset)\n",
    "    - [Perform model quantization](#Perform-model-quantization)\n",
    "- [Run quantized model inference](#Run-quantized-model-inference)\n",
    "### Installation Instructions\n",
    "\n",
    "This is a self-contained example that relies solely on its own code.\n",
    "\n",
    "We recommend  running the notebook in a virtual environment. You only need a Jupyter server to start.\n",
    "For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).\n",
    "\n",
    "<img referrerpolicy=\"no-referrer-when-downgrade\" src=\"https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/s3d-mil-nce-text-to-video-retrieval/s3d-mil-nce-text-to-video-retrieval.ipynb\" />\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3ac784f8-5511-4631-ad4c-d3dd816f9d07",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0153bf4b-94e1-44a8-99e6-4a2433b29c23",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import requests\n",
    "\n",
    "from pathlib import Path\n",
    "\n",
    "os.environ[\"TFHUB_CACHE_DIR\"] = str(Path(\"./tfhub_modules\").resolve())\n",
    "\n",
    "if not Path(\"notebook_utils.py\").exists():\n",
    "    r = requests.get(\n",
    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n",
    "    )\n",
    "    open(\"notebook_utils.py\", \"w\").write(r.text)\n",
    "\n",
    "\n",
    "if not Path(\"pip_helper.py\").exists():\n",
    "    r = requests.get(\n",
    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/pip_helper.py\",\n",
    "    )\n",
    "    open(\"pip_helper.py\", \"w\").write(r.text)\n",
    "\n",
    "from pip_helper import pip_install\n",
    "\n",
    "\n",
    "pip_install(\"openvino-tokenizers\", \"openvino>=2025.2.0\")\n",
    "pip_install(\"-q\", \"tensorflow==2.15.1\", \"tf_keras==2.15.1\")\n",
    "\n",
    "pip_install(\"-q\", \"tensorflow_hub==0.16.1\", \"--no-deps\")\n",
    "pip_install(\"-q\", \"opencv-python\", \"nncf>=2.17.0\", \"numpy<2\")\n",
    "pip_install(\"-q\", \"matplotlib>=3.4\")\n",
    "\n",
    "\n",
    "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n",
    "from notebook_utils import collect_telemetry\n",
    "\n",
    "collect_telemetry(\"s3d-mil-nce-text-to-video-retrieval.ipynb\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b6b3c5db-ae42-45a0-ada0-7f1319221621",
   "metadata": {},
   "source": [
    "Download the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4dea4952-780b-42e1-a973-c45fd4b8508e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import tensorflow_hub as hub\n",
    "\n",
    "import numpy as np\n",
    "from IPython import display\n",
    "\n",
    "\n",
    "hub_handle = \"https://www.kaggle.com/models/deepmind/mil-nce/TensorFlow1/s3d/1\"\n",
    "hub_model = hub.load(hub_handle)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8a962845-20cf-454a-b796-59dc7891ebcf",
   "metadata": {},
   "source": [
    "The model has 2 signatures, one for generating video embeddings and one for generating text embeddings. We will use these embedding to find the nearest neighbors in the embedding space as in the original tutorial. Below we will define auxiliary functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "74e1ea65-7bae-4a49-a520-8e5898adbbce",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_embeddings(model, input_frames, input_words):\n",
    "    \"\"\"Generate embeddings from the model from video frames and input words.\"\"\"\n",
    "    # Input_frames must be normalized in [0, 1] and of the shape Batch x T x H x W x 3\n",
    "    vision_output = model.signatures[\"video\"](tf.constant(tf.cast(input_frames, dtype=tf.float32)))\n",
    "    text_output = model.signatures[\"text\"](tf.constant(input_words))\n",
    "\n",
    "    return vision_output[\"video_embedding\"], text_output[\"text_embedding\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09e0047d-6fb8-4b18-a66d-e8e8a37154ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th>Video 1</th><th>Video 2</th><th>Video 3</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td></tr></table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from s3d_mil_nce_helper import load_video, download_with_headers, display_query_and_results_video\n",
    "\n",
    "\n",
    "video_1_url = \"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\"\n",
    "video_2_url = \"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\"\n",
    "video_3_url = \"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\"\n",
    "\n",
    "video_1_path = download_with_headers(video_1_url, \"data/YosriAirTerjun.gif\")\n",
    "video_2_path = download_with_headers(video_2_url, \"data/Guitar_solo_gif.gif\")\n",
    "video_3_path = download_with_headers(video_3_url, \"data/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\")\n",
    "\n",
    "video_1 = load_video(str(video_1_path))\n",
    "video_2 = load_video(str(video_2_path))\n",
    "video_3 = load_video(str(video_3_path))\n",
    "all_videos = [video_1, video_2, video_3]\n",
    "\n",
    "query_1_video = \"waterfall\"  # @param {type:\"string\"}\n",
    "query_2_video = \"playing guitar\"  # @param {type:\"string\"}\n",
    "query_3_video = \"car drifting\"  # @param {type:\"string\"}\n",
    "all_queries_video = [query_1_video, query_2_video, query_3_video]\n",
    "all_videos_urls = [video_1_url, video_2_url, video_3_url]\n",
    "\n",
    "\n",
    "def display_video(urls):\n",
    "    html = \"<table>\"\n",
    "    html += \"<tr><th>Video 1</th><th>Video 2</th><th>Video 3</th></tr><tr>\"\n",
    "    for url in urls:\n",
    "        html += \"<td>\"\n",
    "        html += '<img src=\"{}\" height=\"224\">'.format(url)\n",
    "        html += \"</td>\"\n",
    "    html += \"</tr></table>\"\n",
    "    return display.HTML(html)\n",
    "\n",
    "\n",
    "display_video(all_videos_urls)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5b91a14a-d565-4526-bd81-7a6c3e15c235",
   "metadata": {},
   "source": [
    "## The original inference\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0563060f-0ff4-428f-8ac6-9f7b240e570f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare video inputs.\n",
    "videos_np = np.stack(all_videos, axis=0)\n",
    "\n",
    "# Prepare text input.\n",
    "words_np = np.array(all_queries_video)\n",
    "\n",
    "# Generate the video and text embeddings.\n",
    "video_embd, text_embd = generate_embeddings(hub_model, videos_np, words_np)\n",
    "\n",
    "# Scores between video and text is computed by dot products.\n",
    "all_scores = np.dot(text_embd, tf.transpose(video_embd))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "5fac7925-676e-46b5-b48b-d2b3397d51de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h2>Input query: <i>waterfall</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:4.71</th><th>Rank #2, Score:-1.63</th><th>Rank #3, Score:-4.17</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td></tr></table><br><h2>Input query: <i>playing guitar</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:6.50</th><th>Rank #2, Score:-1.79</th><th>Rank #3, Score:-2.67</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td></tr></table><br><h2>Input query: <i>car drifting</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:8.78</th><th>Rank #2, Score:-1.07</th><th>Rank #3, Score:-2.17</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td></tr></table><br>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display results.\n",
    "html = \"\"\n",
    "for i, words in enumerate(words_np):\n",
    "    html += display_query_and_results_video(words, all_videos_urls, all_scores[i, :])\n",
    "    html += \"<br>\"\n",
    "display.HTML(html)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "67a0373a-ee9e-4728-97b2-355d1ed501a3",
   "metadata": {},
   "source": [
    "## Convert the model to OpenVINO IR\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "OpenVINO supports TensorFlow models via conversion into Intermediate Representation (IR) format. We need to provide a model object, input data for model tracing to `ov.convert_model` function to obtain OpenVINO `ov.Model` object instance. Model can be saved on disk for next deployment using `ov.save_model` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e1f51b7b-a215-44af-90be-83eb9d2cfd43",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openvino_tokenizers  # NOQA Need to import conversion and operation extensions\n",
    "import openvino as ov\n",
    "\n",
    "model_path = hub.resolve(hub_handle)\n",
    "# infer on random data\n",
    "images_data = np.random.rand(3, 32, 224, 224, 3).astype(np.float32)\n",
    "words_data = np.array([\"First sentence\", \"Second one\", \"Abracadabra\"], dtype=str)\n",
    "\n",
    "ov_model = ov.convert_model(model_path, input=[(\"words\", [3]), (\"images\", [3, 32, 224, 224, 3])])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "62128306-2021-41eb-a965-2494fa8abbf2",
   "metadata": {},
   "source": [
    "## Compiling models\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Only CPU is supported for this model due to strings as input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c3306fcc-bcfc-41f9-adbb-79afe259e4cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "core = ov.Core()\n",
    "\n",
    "compiled_model = core.compile_model(ov_model, device_name=\"CPU\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "632d8390-36b8-4e46-9226-2546d3be678b",
   "metadata": {},
   "source": [
    "## Inference\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "0e653b36-2c45-4609-b24f-64401ed0ca13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Redefine `generate_embeddings` function to make it possible to use the compile IR model.\n",
    "def generate_embeddings(model, input_frames, input_words):\n",
    "    \"\"\"Generate embeddings from the model from video frames and input words.\"\"\"\n",
    "    # Input_frames must be normalized in [0, 1] and of the shape Batch x T x H x W x 3\n",
    "    output = compiled_model({\"words\": input_words, \"images\": tf.cast(input_frames, dtype=tf.float32)})\n",
    "\n",
    "    return output[\"video_embedding\"], output[\"text_embedding\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a28a44c9-9c1a-427f-b7a8-eab007d0f4e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate the video and text embeddings.\n",
    "video_embd, text_embd = generate_embeddings(compiled_model, videos_np, words_np)\n",
    "\n",
    "# Scores between video and text is computed by dot products.\n",
    "all_scores = np.dot(text_embd, tf.transpose(video_embd))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "545a657b-849f-454a-a8c9-17f1a653f3ae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h2>Input query: <i>waterfall</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:4.71</th><th>Rank #2, Score:-1.63</th><th>Rank #3, Score:-4.17</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td></tr></table><br><h2>Input query: <i>playing guitar</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:6.50</th><th>Rank #2, Score:-1.79</th><th>Rank #3, Score:-2.67</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td></tr></table><br><h2>Input query: <i>car drifting</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:8.78</th><th>Rank #2, Score:-1.07</th><th>Rank #3, Score:-2.17</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td></tr></table><br>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display results.\n",
    "html = \"\"\n",
    "for i, words in enumerate(words_np):\n",
    "    html += display_query_and_results_video(words, all_videos_urls, all_scores[i, :])\n",
    "    html += \"<br>\"\n",
    "display.HTML(html)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "947cdd77-4e5c-4dbe-ac46-3581bde15b47",
   "metadata": {},
   "source": [
    "## Optimize model using NNCF Post-training Quantization API\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "[NNCF](https://github.com/openvinotoolkit/nncf) provides a suite of advanced algorithms for Neural Networks inference optimization in OpenVINO with minimal accuracy drop.\n",
    "We will use 8-bit quantization in post-training mode (without the fine-tuning pipeline).\n",
    "The optimization process contains the following steps:\n",
    "\n",
    "1. Create a Dataset for quantization.\n",
    "2. Run `nncf.quantize` for getting an optimized model.\n",
    "3. Serialize an OpenVINO IR model, using the `ov.save_model` function."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "378312a9-3d90-40e4-b5fa-8de2990556dc",
   "metadata": {},
   "source": [
    "### Prepare dataset\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "This model doesn't require a big dataset for calibration. We will use only example videos for this purpose.\n",
    "NNCF provides `nncf.Dataset` wrapper for using native framework dataloaders in quantization pipeline. Additionally, we specify transform function that will be responsible for preparing input data in model expected format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "ec4eae46-f5b0-41b8-83f6-709fc56aaaa4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, openvino\n"
     ]
    }
   ],
   "source": [
    "import nncf\n",
    "\n",
    "dataset = nncf.Dataset(((words_np, tf.cast(videos_np, dtype=tf.float32)),))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a76fa7a5-bc18-456c-b2aa-326d9eae7af3",
   "metadata": {},
   "source": [
    "### Perform model quantization\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "The `nncf.quantize` function provides an interface for model quantization. It requires an instance of the OpenVINO Model and quantization dataset. \n",
    "Optionally, some additional parameters for the configuration quantization process (number of samples for quantization, preset, ignored scope etc.) can be provided."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "6e594e8d-0ec5-4c46-a971-93b4aee3ba05",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "92afc81f874c4772b0175868f30029fb",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b41bcd758716415c8a94569fc79638b5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:nncf:39 ignored nodes were found by name in the NNCFGraph\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "1a4302b5a20c4cb5b40ca2c61056c557",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "38ca42ba415049ab963962b9b7e10bdc",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "MODEL_DIR = Path(\"model/\")\n",
    "MODEL_DIR.mkdir(exist_ok=True)\n",
    "\n",
    "quantized_model_path = MODEL_DIR / \"quantized_model.xml\"\n",
    "\n",
    "\n",
    "if not quantized_model_path.exists():\n",
    "    quantized_model = nncf.quantize(model=ov_model, calibration_dataset=dataset, model_type=nncf.ModelType.TRANSFORMER)\n",
    "    ov.save_model(quantized_model, quantized_model_path)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e850db35-9094-4294-aa4e-b5017a80e1f5",
   "metadata": {},
   "source": [
    "## Run quantized model inference\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "There are no changes in model usage after applying quantization. Let's check the model work on the previously used example. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "aa59c47e-7d11-405e-a7e3-10d2054401ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "int8_model = core.compile_model(quantized_model_path, device_name=\"CPU\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "3d697fcd-d283-40af-bdd3-ffbf6f0a15e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate the video and text embeddings.\n",
    "video_embd, text_embd = generate_embeddings(int8_model, videos_np, words_np)\n",
    "\n",
    "# Scores between video and text is computed by dot products.\n",
    "all_scores = np.dot(text_embd, tf.transpose(video_embd))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "0e944f55-a6be-4ab6-98e3-369e8f2ce10e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<h2>Input query: <i>waterfall</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:4.71</th><th>Rank #2, Score:-1.63</th><th>Rank #3, Score:-4.17</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td></tr></table><br><h2>Input query: <i>playing guitar</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:6.50</th><th>Rank #2, Score:-1.79</th><th>Rank #3, Score:-2.67</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td></tr></table><br><h2>Input query: <i>car drifting</i> </h2><div>Results: <div><table><tr><th>Rank #1, Score:8.78</th><th>Rank #2, Score:-1.07</th><th>Rank #3, Score:-2.17</th></tr><tr><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/3/30/2009-08-16-autodrift-by-RalfR-gif-by-wau.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/b/b0/YosriAirTerjun.gif\" height=\"224\"></td><td><img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e6/Guitar_solo_gif.gif\" height=\"224\"></td></tr></table><br>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display results.\n",
    "html = \"\"\n",
    "for i, words in enumerate(words_np):\n",
    "    html += display_query_and_results_video(words, all_videos_urls, all_scores[i, :])\n",
    "    html += \"<br>\"\n",
    "display.HTML(html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "openvino_notebooks": {
   "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/assets/76171391/ba516a81-f6f7-4258-9e3b-931d6db7728c",
   "tags": {
    "categories": [
     "Model Demos"
    ],
    "libraries": [],
    "other": [],
    "tasks": [
     "Text-to-Video Retrieval"
    ]
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
